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Abstract 

As a first step in the search of an analytical study of mechanical denaturation of DNA in terms 
of the sequence, we study stable, stationary solutions in the discrete, finite and homogeneous 
Peyrard-Bishop DNA model. We find and classify all the stationary solutions of the model, as well 
as analytic approximations of them, both in the continuum and in the discrete limits. Our results 
explain the structure of the solutions reported by Theodorakopoulos et al. [Phys. Rev. Lett. 93, 
258101 (2004)] and provide a way to proceed to the analysis of the generalized version of the model 
incorporating the genetic information. 
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LEAD PARAGRAPH 



DNA, the molecule that constitutes the basis of the genetic code, is of ut- 
most importance. In particular, its mechanical properties are crucial as opening 
the double helix structure of DNA is needed to read the genetic code and for 
replication of the molecule for reproduction. The complete separation of the 
double helix is called replication, and can be achieved by heating or mechani- 
cally, by pulling the two strands of the molecule apart. We here address the 
mathematical description of mechanical denaturation in terms of a simple model. 
We determine and classify the solutions of the model equations and study their 
stability properties. We also provide an approximate but very accurate way to 
deal analytically with those solutions. Beyond mechanical features, our results 
are relevant for studies of the thermodynamic properties of the DNA chain, and 
may have genomic applications, in so far as mechanical denaturation experi- 
ments that give information about DNA composition can be modelled by our 
model and solutions. 
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I. INTRODUCTION 



Nonlinear models appear in many fields of science since the pioneering discoveries, almost 
50 years ago, of Fermi, Pasta and Ulam This work, in the field of physics, has led many 
scientist to use nonlinear models in the study of complex systems in other subjects. 
Nonlinear models entered into DNA physics with Englander and co-workers ^ (see Q for a 
review on nonlinear models of DNA), in 1980, when they modeled the dynamics of DNA with 
a sine-Gordon equation. Since then, a lot of work has been devoted to nonlinear excitations 
in DNA, both from the dynamics and the statistical mechanics points of view. Among this 
body of work, a particularly succesful model is the Peyrard-Bishop (PB) one 0, that will 
be our starting point in this paper. 

One problem of special interest in the framework of DNA was the thermal denaturation 
transition, which takes place at temperatures around 90°C, when the two strands of the 
DNA molecule separate. On the other hand, mechanical denaturation, that occurs when 
one of the strands of the molecule is separated from the other by pulling it in single molecule 
experiments, was achieved in the last few years [7]. In order to model these phenomena, 
most of the research done so far refers to homopolymers, i.e., homogeneous DNA molecules 
consisting entirely of A-T or C-G base pairs. When the issue under discussion is genomics, 
or gene identification, which is very much related to the above mentioned problems, models 
of heteropolymers are required: The distribution of A-T and C-G base pairs follows non- 
uniform, nonhomogeneous sequences obtained from genome analysis. The heterogeneous 
PB model is also being used for identifying relevant sites, such as promoters 0, 0] in viral 
sequences, and also for analyzing the thermal denaturation process jlo| . 

The main motivation of this work is the study of the effects of the sequence heterogeneity 
on the dynamics of the mechanical denaturation process. We began that research program 
by analyzing the Englander (basically, the sine-Gordon equation [2J) model. The results we 



obtained 
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1^ showed that the Englander model was much too simple to reproduce the 
jhenomena observed in experiments, and therefore we decided to focus on the PB model (see 
isl ] for an authoritative review about this model). In that context, our immediate aim was to 
obtain a tool in this model similar the effective potential proposed for the Englander model 



by Salerno and Kivshar 



i],Q,li3 i 



in order to study the relation between the dynamics of 



these excitations and gene identification. To that end, it is necessary to obtain stationary 
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states of the homogeneous model. Those were available for the continuous version of the 
PB model, but, in fact, DNA is quite a discrete system, and the discretization parameter 
depends on experimental measurements used as parameters in the model. For instance, in 
the PB model, the dimensionless parameter that defines the effective discretization of the 



system can go from R = 10.1 (see next section for a definition of R), used in [13, Q, to 
i? ~ 75 used in |18|, or even to R = 100 in In all cases, these R values correspond to 
systems that are far from the continuum limit. Therefore, as a first step towards our chief 
goal of understand sequence effects on denaturation, our immediate purpose is to study 
stable, stationary states in the discrete PB model, with a special focus on their dependence 
on the effective discretization. 

In this paper, we aim to finding stationary solutions of the PB model and their cor- 
responding stability conditions. These issues are addressed in Sees. |TT| Subsequently, we 
discuss the validity of the continuum limit and the domain wall approximation in Sec. IIIH 
while in the main part of the paper. Sec. IIV| we propose analytical approximations for 
the discrete case and compare our results with the ones obtained in [l2|. Finally, Sec. El 
concludes the paper by summarizing our main results and their possible implications. 



II. DISCRETE SOLUTIONS AND STABILITY 



In the following we will use the dimensionless PB model, defined by the hamiltonian 

n=0 

where V{Y) = (1 — e^^)^ is the Morse potential, that stands for the atraction between 
the two bases of a base pair, and i? is a positive, dimensionless constant that refers to the 
intensity of the coupling of two consecutive bases. This constant plays the role of an effective 
discretization, a = VR, so that R ^ 1 stands for a large discretization and i? -C 1 is the 
continuous limit. 

Static solutions of hamiltonian must satisfy dH/dYn = 0, which turns out to be the 
recurrence relation 

= 2F„ - r„_i + (2) 

for n = 1, 2, . . . , A^. These solutions are uniquely defined by the initial condition {Yq, Yi}. 
If we restrict ourselves to solutions with l^o = 0, which we can do without loss of generality, 
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then each Yn will only depend on the value Yi = y, so that Eq. Q can be rewritten in terms 
of y introducing the notation Yn{y) instead of Yn. From now on we will discuss the behavior 
of the solutions Yn{y) as a function of y. 

Equation (j2)) describes stable and unstable solutions of hamiltonian (P). In order to assess 
the stability properties of the solutions, we need to study the hessian of the system, 

/ di{y) -1 ... \ 
-1 d^iy) -1 ... 
-1 d^iy) ... 



-HNiy) 



(3) 



V . . . dN{y) J 

where dn{y) = 2 + RV"{Yn{y)), in order to find out the stability of solutions. Calling An{y) 
the determinant of the principal minor of order n of the hessian 'HN{y)^ i-e., 



A„(y) = det(7^„(y)). 



(4) 



a stable solution must satisfy ^n{y) > for all = 1,2, ...,A^. As the hessian is a 
tridiagonal matrix, there is a recursive relation between different A„, 



^n+iiy) = dn+i{y)^n{y) - ^n-i{y), 



(5) 



with Ai = di and A2 = did2 — 1. 

Expression (jSj) above can be rewritten in terms of Y^{y). By deriving expression (j2|) with 
respect to y we find: 

dYn+i{y) 



dy 



[2 + RV'iYM)] Yniy) - YLiiy) = dn{y)Y:iy) - (6) 



with Y2{y) = Ai(?/) and Y^{y) = A2{y). Therefore, it has to be 

AM = K^,iy), 



(7) 



for n = 1, 2, . . . , A^, and hence the stability region of solutions (j2I) are the points that satisfy 
Y;{y) > for all n = 2, 3, . . . , iV + 1. 

This far, no approximations where needed to obtain these results, still valid for any V{Y). 
From now on, we will focus on the PB model by choosing the Morse potential as our V{Y), in 
order to search for an analytic expression of the solutions Q, as well as to find the stability 
in terms of the initial condition y. 



III. CONTINUUM LIMIT OF THE PEYRARD-BISHOP MODEL 



This limit corresponds to taking i? <^ 1, which means that we can use the approximation 
Yn{y) — > Ycontix, y) with x = n\/R in Eq. (0). By so doing we obtain the following differential 
equation: 

d^Y^ont dV 



Ox dY(,Qfit 

which can be easily solved using the initial conditions Ycont{xo, y) = and d Yconti^o, y) / dx 
y/y/R^ where xq stands for the initial site of the model. This solution is 



e 



?/./|sinh 



Xo) + sinh-i 



2 + ^ 

R 



(9) 



for X > Xq. Looking at expression Q we see that, for a finite system, taking y = implies 
Ycont{x-, 0) = for all x > xq. This result differs from the domain wall obtained in the 
continuum limit in the PB model because in this case we have restricted ourselves to 
finite systems (x > Xq). For infinite systems, letting Ycont{xo,y) ^ as Xq — ^ oo, it can be 
seen that there is another stationary solution, namely 

In this respect, we believe that this solution should not be used as an approximation of 
a finite system because it is not a critical point of the continuum version of Eq. (Q) and, 
therefore, we cannot speak of stability in this case as long as we consider DNA as a finite 
lattice. 

An important feature of the continuum aproximation is that, in Eq. Q, Ycont can be 
written as a function of x and ^, where ^ = y/y/R. This implies a scaling relation between 
these two parameters, a relation that is absent for solutions of the discrete limit. This 
behavior can be seen in Fig. Q where we represent Ycont with respect to x (with xq = 0) 
for two values of ^, compared to the exact result Yn{y) (recall that x = n\/R) for different 
values of i?, and with y = ^VR. We clearly observe that for the largest values of i? (i? = 1 
and, mostly, R = 10) the scaling relation is not fulfilled, indicating the crossover to the 
discrete limit regime. 

To analize the stability of these solutions, using the result of section |Tll it is enough to 
study the sign of d Ycont{x , y) / dy for all x > 0. As the derivative of expression ^ is quite 




FIG. 1: Plots of Ycont{x,y) (obtained from expression ©) for ^ = 0.1 (left) and ^ = 1 (right), 
where ^ = y/VU, compared to the exact solutions Yn{y) of the discrete recurrence relation (j2j 
calculated for the same quotient ^ = y/VR but for different values of R (and, therefore, different 
values of y). 
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FIG. 2: Plots of d Ycont{x , y) / dy for (from lower to higher) x = 1, x = 5, x = W, x = 50 and 
X = 100. All of them are positive for any y > 0, so all solutions of the form © are stable for any 
y>0. 

cumbersome, we prefer to show plots of the result for different values of x, which we collect 
in Fig. 121 It can be shown in general that dYcont{x,y)/dy > for all a; > 0, and therefore 
Eq. © is a stable solution of (0). 

In order to check these results, we used the equations of motion of the model (|H) in order 
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to simulate the dynamics of initial data given by Q with small perturbations, and then 
monitored the evolution of this curves in time. Fixed boundary conditions at both ends of 
the simulated interval were used, in order to prevent the chain from spontaneously closing: 
We note that the global minimum of the hamiltonian (^Q) is the null solution. Therefore, 
in order to verify our results we had to restrict the simulations to the sector of open-chain 
solutions by choosing those boundary conditions. With that caveat, our simulations fully 
confirm the predicted stability of solutions. We stress that such solutions are the ones that 
are relevant to the mechanical denaturation problem, where the spontaneous closing of the 
chain is prevented by the force exerted on the open end. 



IV. DISCRETE LIMIT OF THE PEYRARD-BISHOP MODEL 



A. Solutions 



The discrete limit of the PB model corresponds to letting R ^ 1, and can be obtained 
following a few steps. Using a telescopic summation of l^n+i — 1^, and noting the initial 
conditions 10 = and Yi = y, Eq. Q can be rewritten as 

n 

= {n + l)y + Rj2in + l-k) V'{Yk{y)). (11) 

k=l 

We now define 

f,iy) ^ V'{Y,{y)) = 2e-^'=(^) (l - g-^'^^^)) = h{Y,{y)). (12) 

These functions are plotted for different values of R in the discrete limit in Fig. El As 
can be seen, these fk are very localized, their overlapping depending on R. In fact, in 
the discrete limit we are working on, which implies low overlapping of the curves, we can 
calculate the position of the maxima of each fk{y), and subsequently approximate fk{y) by 
the first function, fi{y), by writing 

My)c:,fi'\y)^Mhy), (13) 

with 



bn = = 

2^yR{R + 2) 



(^R+l + ^/R{R + 2)y - + Vi?(i? + 2))"] . (14) 



R= 10 



R= 100 




FIG. 3: Functions fniu) for i? = 10 (left) and R = 100 (right), for different values of n. 



R= 10 



R= 100 
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FIG. 4: Approximation Y^^\y) (see Eq. vs. exact Yn{y), for i? = 10 (left) and R = 

100 (right), for different values of n. Exact solutions are drawn with thick lines, whereas the 
corresponding approximations are drawn in thin, solid lines. 



This result allows us to obtain an analytic, approximate expression of the solutions for 
different y in the discrete limit. Substituting it in Eq. (fTTj) . we find that 

n 

yl'Mv) = {n + l)y + Rj2in + l- k) jf\y). (15) 

fe=i 

is a good approximation of the exact solution Yn{y) for large values of R. Figure 0] con- 
firms the accuracy of this approximation: for R > 100, the approximation is very accurate, 
whereas the smaller R the worse the approximation. For smaller values of R, the approxima- 
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R= 10 



R= 100 
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FIG. 5: Approximation yP [y) (see Eq. (HTJ) vs. exact Yn{y), for i? = 10 (left) and R = 
100 (right), for different values of n. Exact solutions are drawn with thick lines, whereas the 
corresponding approximations are drawn in thin, solid lines. 



tion can be improved by resorting to the next function, f2{y) instead of fi{y), as a substitute 
for the rest of the fk{y), by defining 

fk{y)^fi'\y)^f2{yh (16) 

for /c = 3, 4 . . ., and approximating Yn{y) by 

n 

Y^%{y) = {n+ l)y + nRh{y) + i?^(n + l-k) ff\y). (17) 

k=2 

In this case, the approximation is even better than for Yn^\y), and even for R = 10 the 
results are very close to the exact ones (see Fig. El for details). 

The errors of these approximations depend on the value of R and n. For instance, Yn^^ (y) 
is exact for n = 1 and n = 2, for any value of R, whereas Yn {y) is exact up to n = 3 for 
any R. For low values of n, the main difference between the exact /„(?/) (which can be easily 
obtained numerically) and fn^ is located around the maxima of fn{y), with a maximum error 
Emax — 0.06 for R = 10 and Emlx — 0.006 for R = 100. For the second order approximation 
based on fn \ the maximum error is Emlx — 2.710^'^ for _R = 10 and Emlx — 3.210^^ 
for R = 100; at the same time, another discrepant region, much less so than the main 
one, appears around the position of the maxima of /n-i(y). The same calculation can be 
done for higher orders of the approximating function, fn\y), and it can be seen that the 
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reduction of the error using A; + 1 instead of k is at least of one order of magnitude. There 
is a computational limit near the precision of the machine, which does not allow us to check 
the validity of this assumption further than a certain k and n, depending of the value of R, 
but, as far as we know, it is reasonable to expect that the same behavior will take place for 
higher values of k and n. Therefore, we conjecture that higher orders of functions fk{y) can 
be used as approximations of fn{y), as fn{y) — fn\y)-, with 



for n > fc, in order to obtain better approximations Yn \y) of the exact solution Yn{y)^ and 
that the error of an approximation of order fc, En\y) = Yn{y) — Yn''\y), can be estimated 
as the difference 



B. Stability 

The approximations defined in (jl5p and (jl7p . as well as the ones mentioned in the above 
section allow us to calculate very accurately the solution Yn{y) for any value of n and y. 
This is important because in the exact, numerical calculation of Eqs. (j21) and there are 
problems for values of y close to zero, due to the numerical precision of the computer (see 
also the discussion below). Therefore, for analyzing the stability in the discrete limit, we 
proceed to use the approximations Yn'^^ (y) previously discussed. By this means, we can work 
with systems of much larger size than the ones that could be studied solving numerically the 
original recurrence relations. For comparison, in the study of stability we will show results for 
systems of small size, where the derivative Y^{y) can be calculated without approximations 
for each n without high errors of the precision of the computer. In Fig. |B] we show the 
dependence of dYn{y) /dlogiQ^y) on function of the initial condition y, in logarithmic scale, 
for different values of the size of the system, A^. We chose dYn{y)/dlogiQ{y) instead of Y^{y) 
in order to obtain a smooth curve: The direct plot of Y^{y) would make very difficult to 
observe the intervals with Y^{y) > 0. As the sign of both derivatives is the same for ally > 0, 
we have resorted the logarithmic one. With this change, a modulated "sinusoidal" structure 
reveals itself in Fig. IHlfor each n, with n — 1 maxima and minima around Y^{y) = 0. From 




(18) 



Ei'\y) ^ Y^'+'\y) - Yi'\y) + O [E^^^'Ky)) , 



(19) 



withEr^\y)^E\:\y). 
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R=10 R=100 




FIG. 6: Derivative of Yn{y) with respect to logiQ{y) for different values of n and for R = 10 (left) 
and R = 100 (right), with logarithmic x axis. The stability region of a system of size N is the 
intersection of all the points that satisfy d Y/dy > for n = 1, 2, . . . , + 1. From the figures, we 
find that the stability region corresponds to the points that satisfy the condition for n = N + 1, as 
the stability region of a system of size N seems to be embedded in the stability region of a system 
of size — 1 (see text for an explanation) . 



that figure, it is apparent that a new interval of instability for lower values of y appears in 
systems of size n as compared to systems of size n — 1. In addition to this, Fig.lHlalso suggests 
that the set of unstable points of a system of size n containes the set of unstable points of a 
system of size n — 1. A plausibility argument for this statement goes as follows: Let us look 
at points that satisfy Y^{yo) = 0, in the extremes of an interval of unstable points. Then, if 
Y^_^{yo) > 0, it must be y^_^^(?/o) < (see Eq. ©), and therefore the interval of unstable 
points for a system of size n will be larger than for a system of size n — 1. This condition 
is satisfied by all the new unstable intervals that appear for each Yn{y), starting on Y2{y), 
and therefore, by induction, it can be applied to all systems. Therefore, all stable points of 
a system of size n are those who satisfy Y^^^{y) > 0. 

As an independent check of the validity of the results shown in this section, we compared 
our results with the ones recently reported in [l3|. By studying the discrete, stationary 
problem with fixed boundary conditions, they found eight stable and seven unstable solutions 
of a system of size N = 28. The specific boundary conditions they used were Yq = and 

(2) 

Yat = 80 for R = 10.1. The exact Yn{y) and the approximate solution Yn {y) of that 
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180 




(2) 

FIG. 7: Stable and unstable solutions given by approximation for a system with = 28 
sites, Yq = and Y/v = 80. The solutions are the intersections of the curves with the line 1^28 = 80. 
It is necessary to study the sign of l^'g on each solution in order to establish the stability of the 
solutions. Once again, 3Y29(y)/c^logio(y) is plotted for clarity. Our approximation gives eight 
stable and seven unstable solutions, exactly as in 



system are in Fig. [7| The plot now makes clear the precision problem we mentioned above, 
namely when we tried to calculate the exact, numerical solution for low values of y. On the 
other hand, the approximate solution Yn (y) was calculated without any problem in a wide 
range of y. It is also shown that Yn'^\y)gives the same number of both stable and unstable 
solutions as in (see explanation in caption), which implies that the structure of peaks of 
Yn{y) gives a good explanation of the number and structure of solutions. We think that this 
method can be applied for larger systems with the way to estimate errors that we explained 
in this section. 



V. CONCLUSIONS 



In this paper, we have reported a study of the stationary solutions of the PB model, 
obtaining exact and analytical approximations of the continuum and the discrete limit. We 
have been able to obtain all the stationary solutions and to classify them according to their 
stability by considering the stationary equation as an initial value problem. We have also 
found that, in the discrete limit, the exact solutions can be approximated to the desired 
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degree of accuracy by using the functions /fc(w) as explained above. We have compared 



our results obtained in the discrete limit with [12| finding a very good agreement with the 
number of stable and unstable solutions of a PB system with fixed boundary conditions, thus 
giving an explanation of the multiple solutions of the problem and the stability. In fact, our 
results show that every solution of the initial value problem, which is unique for every choice 
of y, corresponds to exactly one of the problem with fixed boundary conditions 17], which 
does not have a unique solution. This is the reason why the picture we are providing here 
is much more comprehensive and allows to understand fully the space of solutions of the 
problem. On the other hand, the method explained in this paper to obtain stable solutions 
of a system of size and opening L allows to work with larger systems, as solutions and 
their stability are calculated by evaluating a function, instead of numerically (as in 17[, with 
= 28). We also believe that this study may be extended to the more accurate description 
of DNA given by the Peyrard-Bishop-Dauxois model where the coupling between two 
consecutive bases of the DNA molecule has an anharmonic term that affects the general 
behavior of the openings ^, [l^ . 

As stated in the introduction, this stems from previous studies in the sine-Gordon (Eng- 
lander) model of DNA , where the relation between the dynamics of soliton-like 

excitations and the inhomogeneity of the DNA sequence was studied. In fact, what we are 
reporting here is only the first step towards the study of an effective potential that may 
explain the dynamics of these stationary, stable solutions in presence of heterogeneities in 
the sequence and an external force. Once an analytical expression of the stationary solu- 
tions of the model is found, as we have just done, we will resort to a collective coordinate 
technique to find an effective potential description of the dynamics. The final aim of such 
a program is to find out whether this approach allows to identify important sites from the 
genomic viewpoint along any given sequence. While work along these lines is in progress, we 
believe that the richness of the structure of the stationary solutions we have found and their 
stability is of interest in itself and can motivate further research in these and related models. 
Finally, we believe our solutions can be exploited to analyze the statistical mechanics of the 



PB model along the lines of 



17 



19|, in particular because of the advantage of having an 



approximate, analytical expression. 
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